AITopics | agnostic learning


Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

Tikhonov, Sergei, Vasilyan, Arsen

arXiv.org Machine Learning | May-28-2026

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.
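
To make the objects in this abstract concrete, the following minimal sketch evaluates a Boolean function of $K$ halfspaces on Gaussian inputs and measures empirical 0-1 loss. It is not the paper's algorithm. The helper names (`halfspace_function`, `zero_one_loss`, `gaussian_samples`), the example weights, and the 10% label-flip rate standing in for arbitrary agnostic labels are all assumptions chosen for illustration.

```python
import random

def halfspace_function(weights, thresholds, combine):
    """Build x -> combine(signs), where signs[i] = sign(<w_i, x> - t_i) in {-1, +1}."""
    def f(x):
        signs = tuple(
            1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else -1
            for w, t in zip(weights, thresholds)
        )
        return combine(signs)
    return f

def zero_one_loss(h, samples):
    """Fraction of labeled samples (x, y) on which h(x) != y."""
    return sum(1 for x, y in samples if h(x) != y) / len(samples)

def gaussian_samples(target, d, n, noise, rng):
    """Draw x ~ N(0, I_d), label it with target, then flip the label with probability noise."""
    out = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = target(x)
        if rng.random() < noise:
            y = -y
        out.append((x, y))
    return out

rng = random.Random(0)
# Hypothetical target: the intersection (AND) of K = 2 halfspaces in d = 3 dimensions.
target = halfspace_function(
    weights=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    thresholds=[0.0, 0.0],
    combine=lambda s: 1 if all(v == 1 for v in s) else -1,
)
samples = gaussian_samples(target, d=3, n=2000, noise=0.1, rng=rng)
constant = lambda x: -1  # a trivial competitor that always predicts -1

loss_target = zero_one_loss(target, samples)
loss_constant = zero_one_loss(constant, samples)
print(loss_target, loss_constant)
```

A proper agnostic learner in this setting would have to return a hypothesis of the same form as `target` whose loss is within $ε$ of the best achievable in the class; the sketch only shows how such a hypothesis is represented and scored.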

artificial intelligence, halfspace, machine learning, (17 more...)

arXiv.org Machine Learning

2605.27594

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration

Neural Information Processing Systems | Apr-28-2026, 18:31:10 GMT

We introduce and study Swap Agnostic Learning. The problem can be phrased as a game between a predictor and an adversary: first, the predictor selects a hypothesis $h$; then, the adversary plays in response, and for each level set of the predictor $\{x \in \mathcal{X} : h(x) = v\}$ selects a loss-minimizing hypothesis $c_v \in \mathcal{C}$; the predictor wins if $h$ competes with the adaptive adversary's loss. Despite the strength of the adversary, our main result demonstrates the feasibility of Swap Agnostic Learning for any convex loss. Somewhat surprisingly, the result follows by proving an equivalence between Swap Agnostic Learning and swap variants of the recent notions of Omniprediction [15] and Multicalibration [20]. Beyond this equivalence, we establish further connections to the literature on Outcome Indistinguishability [6, 14], revealing a unified notion of OI that captures all existing notions of omniprediction and multicalibration.
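
The game described in this abstract can be sketched on a finite dataset. The code below is an illustrative assumption, not the paper's construction: it uses squared loss, a two-element hypothesis class, and a four-point toy dataset, and the function names are hypothetical. It shows a predictor that beats the single best hypothesis in the class yet still loses to the adversary that re-optimizes on each level set.

```python
from collections import defaultdict

def squared_loss(prediction, y):
    return (prediction - y) ** 2

def predictor_loss(h, data, loss=squared_loss):
    """Average loss of a fixed hypothesis h over the dataset."""
    return sum(loss(h(x), y) for x, y in data) / len(data)

def swap_adversary_loss(h, hypotheses, data, loss=squared_loss):
    """Average loss of an adversary that picks the best c_v on each level set of h."""
    level_sets = defaultdict(list)
    for x, y in data:
        level_sets[h(x)].append((x, y))
    total = 0.0
    for points in level_sets.values():
        # The adversary chooses a separate loss-minimizing hypothesis per level set.
        total += min(sum(loss(c(x), y) for x, y in points) for c in hypotheses)
    return total / len(data)

# Toy data: labels are 0 for x < 2 and 1 otherwise.
data = [(0, 0.0), (1, 0.0), (2, 1.0), (3, 1.0)]
hypotheses = [lambda x: 0.0, lambda x: 1.0]  # hypothetical class of two constants

h_hedged = lambda x: 0.4 if x < 2 else 0.6  # two level sets, hedged predictions
h_sharp = lambda x: 0.0 if x < 2 else 1.0   # two level sets, exact predictions

best_single = min(predictor_loss(c, data) for c in hypotheses)
gap_hedged = predictor_loss(h_hedged, data) - swap_adversary_loss(h_hedged, hypotheses, data)
gap_sharp = predictor_loss(h_sharp, data) - swap_adversary_loss(h_sharp, hypotheses, data)
print(best_single, gap_hedged, gap_sharp)
```

Here `h_hedged` has lower loss than either constant, so it would pass an ordinary agnostic comparison, but its positive gap against the per-level-set adversary shows why the swap benchmark is stricter.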

artificial intelligence, machine learning, multicalibration, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hardness of Online Sleeping Combinatorial Optimization Problems

Satyen Kale, Chansoo Lee, David Pal

Neural Information Processing Systems | Mar-23-2026, 02:34:48 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

A Omitted Proofs

Neural Information Processing Systems | Feb-15-2026, 11:30:09 GMT

Taking [...] = p / [...] gives the desired claim. [...] Claim 2.7, we know that the multicalibration violation for [...] The inequalities follow by Hölder's inequality and the assumed bound on the weight of [...] Recall that $\mathrm{Cov}[y, z] = \mathbb{E}[yz] - \mathbb{E}[y]\,\mathbb{E}[z]$. Here, we give a high-level overview of the MCBoost algorithm of [20] and weak agnostic learning. Algorithm 2 (MCBoost). Parameters: hypothesis class C and [...] > 0. Given: dataset S sampled from D. Initialize: p(x) ← 1/2. [...] By Lemma 3.8, we know that [...] In this Appendix, we give a full account of the definitions and results stated in Section 4.

artificial intelligence, loss oi, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration

Neural Information Processing Systems | Feb-15-2026, 11:30:06 GMT

We introduce and study Swap Agnostic Learning.

artificial intelligence, machine learning, multicalibration, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples

Neural Information Processing Systems | Feb-9-2026, 23:28:46 GMT

In particular, for any function in a class C of bounded VC dimension, we guarantee a low test error rate and a low rejection rate with respect to P. Our algorithm is efficient given an Empirical Risk Minimizer (ERM) for C.

artificial intelligence, machine learning, test example, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.48)

Learning versus Refutation in Noninteractive Local Differential Privacy

Neural Information Processing Systems | Feb-9-2026, 19:59:42 GMT

The definition of agnostic learning above is classical.

artificial intelligence, machine learning, refutation, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.15)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.68)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Security & Privacy (0.68)

Agnostic Learning with Multiple Objectives

Neural Information Processing Systems | Dec-24-2025, 20:32:39 GMT

Most machine learning tasks are inherently multi-objective. This means that the learner has to come up with a model that performs well across a number of base objectives $\mathcal{L}_{1}, \ldots, \mathcal{L}_{p}$, as opposed to a single one. Since optimizing with respect to multiple objectives at the same time is often computationally expensive, the base objectives are often combined in an ensemble $\sum_{k=1}^{p}\lambda_{k}\mathcal{L}_{k}$, thereby reducing the problem to scalar optimization. The mixture weights $\lambda_{k}$ are set to uniform or some other fixed distribution, based on the learner's preferences. We argue that learning with a fixed distribution on the mixture weights runs the risk of overfitting to some individual objectives and significantly harming others, despite performing well on an entire ensemble. Moreover, in reality, the true preferences of a learner across multiple objectives are often unknown or hard to express as a specific distribution. Instead, we propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), where a model is optimized for any weights in the mixture of base objectives. We present data-dependent Rademacher complexity guarantees for learning in the ALMO framework, which are used to guide a scalable optimization algorithm and the corresponding regularization.
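
The contrast between a fixed mixture and optimizing for any mixture weights can be illustrated with a small selection problem. This is a sketch under stated assumptions, not the paper's optimization algorithm: the two candidate models and their per-objective losses are invented, and the weights are assumed to range over the whole probability simplex, where the worst-case value of a linear mixture is simply the largest individual loss.

```python
def mixture_loss(losses, weights):
    """Scalarized objective: sum_k lambda_k * L_k for fixed mixture weights."""
    return sum(w * l for w, l in zip(weights, losses))

def worst_case_mixture_loss(losses):
    """Maximum of sum_k lambda_k * L_k over the probability simplex.

    The mixture is linear in lambda, so the maximum sits at a vertex,
    which is the single largest base loss.
    """
    return max(losses)

# Hypothetical per-objective losses (L_1, L_2, L_3) for two candidate models.
candidates = {
    "model_a": [0.05, 0.05, 0.70],  # strong on two objectives, poor on the third
    "model_b": [0.30, 0.30, 0.30],  # balanced across all objectives
}
uniform = [1 / 3, 1 / 3, 1 / 3]

best_uniform = min(candidates, key=lambda m: mixture_loss(candidates[m], uniform))
best_agnostic = min(candidates, key=lambda m: worst_case_mixture_loss(candidates[m]))
print(best_uniform, best_agnostic)
```

The uniform mixture prefers `model_a` even though it performs badly on the third objective, while the worst-case criterion prefers the balanced `model_b`, which is the kind of harm to individual objectives the abstract warns about.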

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Agnostic Learning of a Single Neuron with Gradient Descent

Neural Information Processing Systems | Dec-23-2025, 23:11:28 GMT

We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples.
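
A minimal sketch of the procedure named in this abstract is gradient descent on the empirical square loss of a single neuron. The choices below are assumptions made for illustration and are not taken from the paper: a ReLU activation for $\sigma$, Gaussian inputs in two dimensions, labels generated from a hypothetical `w_star` plus noise, and a fixed step size and iteration count.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def relu_grad(z):
    # Subgradient convention: derivative taken as 0 at z <= 0.
    return 1.0 if z > 0.0 else 0.0

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def empirical_risk(w, data):
    """(1/n) * sum of (sigma(w . x) - y)^2 over the sample."""
    return sum((relu(dot(w, x)) - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Gradient of the empirical risk with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        z = dot(w, x)
        coeff = 2.0 * (relu(z) - y) * relu_grad(z)
        for j, xj in enumerate(x):
            g[j] += coeff * xj / len(data)
    return g

def gradient_descent(data, w0, step, iters):
    w = list(w0)
    for _ in range(iters):
        g = gradient(w, data)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

rng = random.Random(0)
w_star = [1.0, -0.5]  # hypothetical generating weights
data = []
for _ in range(500):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = relu(dot(w_star, x)) + rng.gauss(0.0, 0.1)
    data.append((x, y))

w0 = [0.1, 0.1]  # small nonzero start so the ReLU gradient is not identically zero
w_hat = gradient_descent(data, w0, step=0.1, iters=200)
risk_before = empirical_risk(w0, data)
risk_after = empirical_risk(w_hat, data)
print(risk_before, risk_after, w_hat)
```

In the agnostic setting the labels need not come from any neuron at all; the noisy generating model here is only a convenient way to produce data on which the risk visibly decreases.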

agnostic learning, name change, single neuron, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Smoothed Agnostic Learning of Halfspaces over the Hypercube

Kou, Yiwen, Meka, Raghu

arXiv.org Machine Learning | Nov-25-2025

Agnostic learning of Boolean halfspaces is a fundamental problem in computational learning theory, but it is known to be computationally hard even for weak learning. Recent work [CKKMK24] proposed smoothed analysis as a way to bypass such hardness, but existing frameworks rely on additive Gaussian perturbations, making them unsuitable for discrete domains. We introduce a new smoothed agnostic learning framework for Boolean inputs, where perturbations are modeled via random bit flips. This defines a natural discrete analogue of smoothed optimality generalizing the Gaussian case. Under strictly subexponential assumptions on the input distribution, we give an efficient algorithm for learning halfspaces in this model, with runtime and sample complexity approximately $n^{\mathrm{poly}(1/(\sigma \epsilon))}$. Previously, such algorithms were known only with strong structural assumptions for the discrete hypercube, for example, independent coordinates or symmetric distributions. Our result provides the first computationally efficient guarantee for smoothed agnostic learning of halfspaces over the Boolean hypercube, bridging the gap between worst-case intractability and practical learnability in discrete settings.
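
The bit-flip perturbation mentioned in this abstract can be sketched as follows. The abstract does not spell out the noise model, so this example assumes each coordinate of an input in $\{-1,+1\}^n$ is negated independently with probability `sigma`; the function names, the majority halfspace, and the uniform inputs are likewise hypothetical choices for illustration.

```python
import random

def flip_bits(x, sigma, rng):
    """Negate each coordinate of x in {-1,+1}^n independently with probability sigma."""
    return [-xi if rng.random() < sigma else xi for xi in x]

def halfspace(w, theta):
    """Return x -> +1 if <w, x> >= theta, else -1."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1

def smoothed_loss(f, samples, sigma, trials, rng):
    """Monte Carlo estimate of the 0-1 loss of f evaluated on bit-flipped inputs."""
    errors = 0
    for x, y in samples:
        for _ in range(trials):
            if f(flip_bits(x, sigma, rng)) != y:
                errors += 1
    return errors / (len(samples) * trials)

rng = random.Random(0)
n = 5  # odd dimension so the majority vote never ties
majority = halfspace([1.0] * n, 0.0)
samples = []
for _ in range(200):
    x = [rng.choice((-1, 1)) for _ in range(n)]
    samples.append((x, majority(x)))

loss_clean = smoothed_loss(majority, samples, sigma=0.0, trials=1, rng=rng)
loss_noisy = smoothed_loss(majority, samples, sigma=0.2, trials=20, rng=rng)
print(loss_clean, loss_noisy)
```

With `sigma=0.0` the inputs are untouched and the halfspace that generated the labels has zero loss; with positive `sigma` the same halfspace misclassifies some perturbed points, which is the quantity a smoothed benchmark would average over.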

algorithm, approximation, halfspace, (17 more...)

arXiv.org Machine Learning

2511.17782

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
